|
I started writing this on Jul 7, 2011, and will keep updating it when I find time.
OCR PROCESSING STEPS
1. DOCUMENT INPUT
2. IMAGE PROCESSING
3. DOCUMENT & LAYOUT ANALYSIS
4. RECOGNITION
4.1 Character & Word Recognition
4.1.1 Character Recognition or Optical character recognition
4.1.1.1 Image to Vector
4.1.1.2 Pattern Recognition Algorithm
4.1.1.2.1 nearest neighbor algorithm
4.1.1.2.3 back-propagation (or backprop)
4.1.2 Word Recognition
4.2 Font Types Recognition
5. VERIFICATION & USER INTERACTION
6. EXPORT DOCUMENT OUTPUT
4.1.1 Character Recognition or Optical character recognition
OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text.
Often abbreviated OCR, optical character recognition refers to the branch of computer science that involves reading text from paper and translating the images into a form that the computer can manipulate.
An OCR system enables you to take a book or a magazine article, feed it directly into an electronic computer file, and then edit the file using a word processor.
Confusing S with 5 -> acceptable error
Confusing S with M -> not acceptable
Reason:
Humans observe strokes and the relations between them, while algorithms measure anything from Transformation Ring Projections [2] of a character to the Fourier Transform of the Horizontal-Vertical Projections [3] of a character. These methods do work and are often computationally efficient, but they make the computer see letters through a decidedly non-human set of eyes.
4.1.1.1 Image to Vector
If all you want to do is recognize tiny characters at a low resolution such as 5x5 or 5x7, just list the matrix as a vector. So if your character is this:
11111
10000
11111
10000
11111
just list it as the vector:
1111110000111111000011111
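The flattening step above can be sketched in a few lines of Python. This is only an illustration: the function name flatten_bitmap and the choice to store each row as a string of 0s and 1s are assumptions, not part of any particular OCR library.

```python
def flatten_bitmap(rows):
    """Concatenate the rows of a binary character bitmap into one flat vector.

    `rows` is a list of equal-length strings made of '0' and '1'
    (a hypothetical input format chosen for this sketch).
    """
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # Read the bitmap row by row, left to right, top to bottom.
    return [int(pixel) for row in rows for pixel in row]


# The 5x5 character from the text above.
glyph = ["11111", "10000", "11111", "10000", "11111"]
vector = flatten_bitmap(glyph)
print("".join(map(str, vector)))  # prints 1111110000111111000011111
```

A 5x5 character gives a vector of 25 values and a 5x7 character gives 35, so the vector length grows with the pixel count of the image.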
The above is nice and simple, but if your figure is larger you need to do more work: some pre-processing to cut down the size of the vector and help generalization.
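One possible form of such pre-processing is to shrink the bitmap before flattening it. The sketch below averages non-overlapping square tiles, so each output value is the fraction of "on" pixels in its tile. Block averaging is only one assumed choice here (the text does not say which method to use), and the names downsample and flatten, as well as the 4x4 sample glyph, are made up for the illustration.

```python
def downsample(bitmap, block):
    """Shrink a binary bitmap by averaging non-overlapping block x block tiles."""
    height, width = len(bitmap), len(bitmap[0])
    if height % block or width % block:
        raise ValueError("bitmap dimensions must be multiples of block")
    small = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Count the "on" pixels inside this tile.
            total = sum(
                bitmap[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        small.append(row)
    return small


def flatten(matrix):
    """List a matrix row by row as a single vector."""
    return [value for row in matrix for value in row]


# Hypothetical 4x4 glyph, reduced to a 2x2 grid and then to a 4-element vector.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
features = flatten(downsample(glyph, 2))
print(features)  # prints [1.0, 0.0, 0.25, 0.75]
```

With a block size of 2 the vector is four times shorter than the raw pixel list, and small shifts of a stroke inside a tile change the features only slightly, which is the kind of generalization the pre-processing is meant to help with.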
|